(57) A method for determining the presence of a face from image data includes a face detection algorithm
having two separate algorithmic steps: a first step of prescreening image data with a first component of the
algorithm to find one or more face candidate regions of the image based on a comparison between facial shape
models and facial probabilities assigned to image pixels within the region; and a second step of operating on the
face candidate regions with a second component of the algorithm using a pattern matching technique to examine
each face candidate region of the image and thereby confirm a facial presence in the region, whereby the
combination of these components provides higher performance in terms of detection levels than either component
individually. In a camera implementation, a digital camera includes an algorithm memory for storing an
algorithm comprised of the aforementioned first and second components and an electronic processing section for
processing the image data together with the algorithm for determining the presence of one or more faces in the
scene. Facial data indicating the presence of faces may be used to control, e.g., exposure parameters of the
capture of an image, or to produce processed image data that relates, e.g., color balance, to the presence of
faces in the image, or the facial data may be stored together with the image data on a storage medium.




[FIG. 3: flowchart of camera operations in the framing image mode. Legible step labels: PERFORM FACE DETECTION
ALGORITHM; STORE FACE LOCATION DATA; POINT FOCUS MECHANISM TO FACE REGION OF SCENE; PROVIDE COMPOSITION AIDS FOR
PHOTOGRAPHER; DETERMINE OPTIMAL FLASH SETTING; DETERMINE OPTIMAL EXPOSURE SETTING.]



Printed by Jouve, 75001 PARIS (FR)



EP 1 128 316 A1 



Description 






[0001] The present invention is in the field of image capture, and in particular in the field of image processing for the
purpose of enhancing and optimizing the process of image capture by a camera.

[0002] A preponderance of images collected by photographers contain people, which are often the most important
subjects of the images. Knowledge of the presence and location of people in an image, and especially the presence
and location of their faces, could enable many beneficial improvements to be made in the image capture process.
Some are suggested in the prior art. For example, automatic and semi-automatic focusing cameras often pick a portion
of the scene on which to adjust for best focus. If the camera could locate the faces in a scene, then focus could be
optimized for the faces unless the photographer explicitly overrides that choice. In U.S. Patent No. 5,835,616, a [...]
subject matter. In that connection, the process disclosed in the '616 patent automatically finds a human face in a
digitized image taken by a digital camera, confirms the existence of the face by examining facial features and then has

SS^ST T °" d6,eC,ed ,aCe - DeteC,i0 " ° f 3 feCe alS0 * e ' ds strongtv d nee oS 

[...] area. For example, in U.S. Patent No. 5,430,809 a video camera autonomously
tracks a facial target in order to set a measuring frame on the facial object for purpose of auto exposure and auto focus.

n , ZZT ZT^ a Z^1' an aut0 whi,e balance sys,em adjus,s colors to obt - op*- 

(from U S Patent N 5 62 %Tu h , ?T * Y T * ^ * ^ Mi ° SKin co,or balance ' " is als ° known 
. Tfadatreatn to de ^ P ' '° ^ reDresentin 9 color and/or density of 

^Z^f&Z^ZT " SUCh ^ COrreSp0 " d ^ * «» «« «=an be p Jed 
[0004] While face detection has been studied over the past several years in relation to the subject of image
understanding, [...] been devised that show [...] performance over a range of imaging
conditions. Such methods may be more successfully implemented in large scale processing equipment such as [...],
which have relatively sophisticated processing [...] compared to a hand-held camera. The
challenge is to implement these face detection methods reasonably in a camera with limited memory resources and
with low computational cost. If this can be done successfully, the detection of faces in a scene will then serve as a
[...] improvements in the image capture process. In addition, it would be [...] to detect
up/down orientation for subsequent printing (for example, of index prints).
iTaoesforlh^ 

00061 Thl n re W ? 5 ^ * °* image CaptUre pr0CeSS and imDr o vin 9 O^lity in the captured image 

[...], according to one aspect of the present invention, a method for determining the presence of a face from image
data includes a face detection algorithm having two separate algorithmic steps: a first step of prescreening image data
with a first component of the algorithm to find one or more face candidate regions of the image based on a comparison
between facial shape models and facial probabilities assigned to image pixels within the region; and a second step of
operating on the face candidate regions with a second component of the algorithm using a pattern matching technique
to examine each face candidate region of the image and thereby confirm a facial presence in the region [...].
In a camera implementation, a digital camera includes an algorithm memory for storing an algorithm
comprised of the aforementioned first and second components and an electronic processing section for processing the
image data together with the algorithm for determining the presence of one or more faces in the
scene. Facial data indicating the presence of faces may be used to control, e.g., exposure parameters of the capture of
an image, or to produce processed image data that relates, e.g., color balance, to the presence of faces in the image,
or the facial data may be stored together with the image data on a storage medium.

[0007] In another aspect of the invention, a digital camera includes a capture section for capturing an image and
producing image data; an electronic processing section for processing the image data to determine the presence of
one or more faces in the scene; face data means associated with the processing section for generating face data
corresponding to attributes of at least one of the faces in the image; a storage medium for storing the image data; and
[...] medium.

[0008] In a further aspect [...] a digital camera includes an algorithm memory storing a face detection
algorithm for determining the presence of one or more faces in the image data and a composition algorithm for
suggesting composition adjustments based on certain predetermined composition principles [...]
section for processing the image data together with the algorithms for determining the presence of one or more faces
in the scene and their relation to certain predetermined composition principles. The processing section then generates 
face data corresponding to the location, orientation, scale or pose of at least one of the faces in the image as well as 
composition suggestions corresponding to deviation of the face data from the predetermined composition principles. 
[0009] In yet a further aspect of the invention, a hybrid camera is disclosed for capturing an image of a scene on 
both an electronic medium and a film medium having a magnetic layer. The hybrid camera includes an image capture 
section for capturing an image with an image sensor and producing image data; means for capturing the image on the 
film medium; an electronic processing section for processing the image data to determine the presence of one or more 
faces in the scene; face data means associated with the electronic processing section for generating face data corre- 
sponding to at least one of the location, scale and pose of at least one of the faces in the image; and means for writing 
the face data on the magnetic layer of the film medium. 

[0010] In still a further aspect of the invention, the present invention includes a camera incorporating a digital image 
sensor, a central processing unit such as a microprocessor, a means of detecting a face, and a means of displaying 
the location of detected faces to the photographer. The advantage of each aspect described is that the camera can 
use the face detection capability to improve the picture taking experience for the user, as well as provide numerous 
suggestions to the photographer to obtain better and more pleasing photographs. 

[0011] These and other aspects, objects, features and advantages of the present invention will be more clearly 
understood and appreciated from a review of the following detailed description of the preferred embodiments and 
appended claims, and by reference to the accompanying drawings. 

[0012] FIG. 1 is a block diagram of a face detecting camera showing an arrangement of camera elements in accord- 
ance with the invention. 

[0013] FIG. 2 is a block diagram of the image capture section of the camera shown in Figure 1. 

[0014] FIG. 3 is a flowchart diagram of camera operations involved in the operation of the camera shown in Figure
1 in a framing image mode.

[0015] FIG. 4 is a flowchart diagram of camera operations involved in the operation of the camera shown in Figure
1 in a final image mode.

[0016] FIG. 5 is a flowchart showing the generation of composition suggestions. 

[0017] FIG. 6 is an illustration of an image area divided into a grid for application of the rule of thirds. 

[0018] FIGS. 7A, 7B, 7C and 7D are examples of the shape models for frontal and right semi-frontal poses used in
one of the face detection algorithms.

[0019] FIGS. 8A and 8B show graphical displays of probability densities for skin, which are used in one of the face 
detection algorithms. 

[0020] FIGS. 9A and 9B show graphical displays of probability densities for hair, which are used in one of the face 
detection algorithms. 

[0021] FIGS. 10A, 10B and 10C show an original image and its reconstruction following principal component analysis
in accordance with one of the face detection algorithms.

[0022] Because imaging systems employing electronic and film capture are well known, the present description will 
be directed in particular to attributes forming part of, or cooperating more directly with, systems and apparatus in 
accordance with the present invention. System attributes and component apparatus not specifically shown or described 
herein may be selected from those known in the art. In the following description, a preferred embodiment of the face 
detection algorithm would ordinarily be implemented as a software program, although those skilled in the art will readily 
recognize that the equivalent of such software may also be constructed in hardware. Given the system and methodology 
as described in the following materials, all such software implementation needed for practice of the invention is con- 
ventional and within the ordinary skill in such arts. If the face detection aspect of the invention is implemented as a 
computer program, the program may be stored in a conventional computer readable storage medium, which may comprise,
for example: magnetic storage media such as a magnetic disk (such as a floppy disk) or magnetic tape; optical
storage media such as an optical disc, optical tape, or machine readable bar code; solid state electronic storage devices 
such as random access memory (RAM), or read only memory (ROM); or any other physical device or medium employed 
to store a computer program. 

[0023] Referring now to the block diagrams of Figures 1 and 2, a camera 10 is shown as an integrated system 
embodying the components of a standard camera, including an image capture section 20, a processor or central 
processing unit (CPU) 30, a digital memory 32 for storing captured images and associated annotations related to the 
images, and a display device 34 for displaying captured images and/or other data useful in operation of the camera. 
The capture section 20 includes an optical section 21 having autofocus capability for focusing an image 22 (including, 
for purposes of this description, one or more faces) upon an image sensor 23, such as a conventional charge-coupled 
device (CCD). An exposure control mechanism 24 includes an aperture and shutter for regulating the exposure of the 
image upon the image sensor 23. Instead of (or in addition to) an electronic capture device, the capture section may 
include an analog storage device 25, such as a conventional photographic film. In the case of well-known APS film, 
[...] recording layer, a recording device 26 can record annotation data regarding the captured
[...] layer. A flash unit 27 is also provided for illuminating the image 22 when ambient light is
insufficient.

[0024] The CPU 30 is interconnected via a system bus 40 to a random access memory (RAM) 42, a read-only memory
(ROM) 44, [...] 27 to the bus 40), a communication adapter 48 (for connecting directly to an information handling
system [...]), a [...] tracking stage 49 (for [...] frame 49a [...]), a user interface adapter 50 (for connecting
user interface devices such as a shutter button 52, flash controls 54, programmed exposure selections 56, a user
manipulated display cursor 58 and/or other user interface devices to the bus 40), an algorithm interface adapter 60
(for connecting various stored algorithms to the bus 40), and a display interface 70 (for connecting the bus 40 to the
display device 34). The CPU 30 is sufficiently powerful and has sufficient attached memory 42 and 44 to perform the face
detection algorithm 90. A training database 72, connected to the bus 40, contains sufficient training data to enable
the face detection algorithm 90 to work for a very wide range of imaging conditions. In the preferred embodiment, as will
be described [...], the face detection algorithm includes two component algorithms: a first component algorithm
that estimates a face candidate region of the image based on a comparison between facial shape models and facial
probabilities assigned to image pixels within the region and a second component algorithm operative on the face
candidate region using pattern analysis to examine each region of the image and thereby confirm a facial presence in
the region. [...] false positives and the second component algorithm can restrict its more computationally
intensive processing to the relatively few regions that have passed the first algorithm.

[0025] The results of face detection are used to control a number of functions of the camera, which are embodied
[...] connected to the data bus 40 through the interface adapter 60. The face detection results are tracked
by the tracking stage 49, which sets and manipulates the measuring frame 49a to track, e.g., the centroid of one
or more face locations. The measuring frame is used as described in U.S. Patent No. 5,430,809, which is incorporated
[...] purposes of autofocus, auto exposure, auto color balance and auto [...].
The measuring frame may be a small spot-like area or it may be configured to have borders
[...]; in either case it is intended to [...] the data collected
for the algorithm to face data or some sample thereof. These algorithms include a red eye correction algorithm 80,
[...] a red eye condition produced by the flash unit 27. The exposure control algorithm 82 determines settings from the
[...] that the faces are properly exposed. In conjunction with the exposure control determination, the flash algorithm 84
[...] flash for optimal capture of the facial images. The [...]
measuring frame 49a and to set a pointable focus mechanism in the
optical section 21 [...] so that a final captured image is properly focused on the face
regions. The color balance algorithm 88 is applied to the digital image file in order to optimize the representation of the
[...] regions within the measuring frame 49a so that they match the expected color range of skin tones.

[0026] The display device 34 enables a photographer to preview an image before capture and/or to view the
last image [...]. [...] an optical viewfinder 28 is provided for previewing an image. Moreover, the CPU 30
[...] face detection algorithm to highlight faces within the viewed image if needed. For this purpose, a
semi-transparent liquid crystal display (LCD) overlay 29 may be provided in the optical viewfinder 28; an LCD driver 29a
[...] the LCD overlay 29 corresponding to one or more face locations in response to face location
[...] (such an LCD mask is disclosed in U.S. Patent No. 5,103,254, which is incorporated herein by
reference). [...] can generate highlighted or outlined faces by driving the pattern generator 74
[...], e.g., a box over a face in a viewing area shown on the display device 34. Furthermore,
the faces can be marked by a photographer by moving the cursor 58 on the viewing area of the display device 34 so
[...] box around a face. This could also be done through the LCD driver 29a and
the LCD overlay 29 in the optical viewfinder 28.

[0027] Another advantage of the present invention is that data associated with the detection of faces in an image
could be automatically recorded and included with or as an annotation of an image. [...]
of significant subjects within a photographic record of events without requiring the annotation to be done by the
photographer [...] time of image acquisition or at a later time. The detection of faces in the scene then opens the way
[...] enhancements to the image capture event and to subsequent processing of the image. For
example, face detection will provide a convenient means of indexing images for later retrieval, for example by fetching
[...] one or more people as subjects. Consequently, running the face detection algorithm provides face
data corresponding to one or more parameters such as location, orientation, scale and pose of one or more of the
detected faces. In addition, once faces have been detected, a simple face recognition algorithm can be applied to
identify faces from a small gallery of training faces that the camera has previously captured with help from the user 
and stored in a training data base. The results of face detection and location are stored in an auxiliary data location 
attached to the image, which are together stored in buffer memory in the RAM 42. Images are annotated, for example, 
with the coordinates of detected faces, estimates of face size, positions of the eyes, a rough estimate of the pose 
parameters of the head, and the identity of each individual. Once an image is selected for storage, the image and its 
annotation data can either be stored together in the digital image memory 32 or the annotations can be stored in the 
magnetic layer of the analogue image memory 25 (the image would be stored as a conventional latent image on the 
photographic emulsion). In one configuration for a digital camera, the captured image data is recorded in the storage 
medium in digital folders dedicated to images with a particular number of faces in the scenes. 
[0028] It should also be understood that a further embodiment of the invention is a hybrid camera which simultane- 
ously captures an image of a scene on both an electronic medium, such as the image sensor 23, and a film medium, 
such as the APS film 25. In this embodiment, the CPU 30 processes the image data from the image sensor 23 to 
determine the presence of one or more faces in the scene, and face data is generated corresponding to the location, 
scale or pose of at least one of the faces in the image. Such face data could be displayed to the user of the camera 
on the display 34 in order to evaluate the captured image. If the face data (or image) would suggest a problem with 
the captured image, the user would have the opportunity to recapture the image on another frame of the film 25. 
Additionally, the face data could be written on the magnetic layer of the film medium 25 by activation of the recording 
unit 26. 

[0029] As shown in the diagrams of Figures 3 and 4, respectively, the camera operates first in a framing mode and 
then in a final imaging mode. In each mode, the camera offers a number of automated features to assist the photog- 
rapher. The photographer has the option of disabling the framing mode through the user interface adapter 50, thereby 
disabling acquisition of the framing image and going directly to the final imaging mode. 

Framing Mode 

[0030] In the framing mode shown in Figure 3, the camera 10 obtains a framing image in step 100 by activation of 
the capture section 20. The CPU 30 then performs the face detection algorithm 90 in step 120, by which it attempts to 
detect any faces in the framing image and indicate their locations to the photographer in the viewfinder 28 or on the 
display device 34. More specifically, the face detection algorithm utilizes face training data from the training database 
72 to find faces. If faces are detected in the decision block 130, then face location data is stored in step 140 in the 
RAM 42 for subsequent utilization by one or more of the camera algorithms. Furthermore, the facial locations are 
processed by the display interface 70 and, e.g., the faces produced on the display device 34 are outlined with a box 
or some other kind of outlining feature. If the face detection algorithm 90 is unable to find any faces, this fact is reflected 
in the outcome of the decision block 130. Thus, in response to a face detection failure, i.e., when no faces are found, 
the photographer can return to the beginning via path 132 and slightly repose the scene and allow another chance at 
detection, or can choose in a manual decision block 134 to provide manual detection input to the camera using the 
cursor 58 to manually locate a face in the viewfinder 28 or on the display 34. Other input techniques can be used, for 
example, a touch sensitive screen and stylus (not shown). Then, armed with knowledge of face presence and face 
location in the framing image, the camera 10 is able to provide valuable services to the photographer that can be used 
to improve the final captured image. Such services include focus assistance, exposure and flash determination and 
composition aids, as follows. 
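
The framing-mode flow described above can be sketched roughly as follows. This is a minimal illustration only: the callables `detect_faces` and `manual_locate` are hypothetical stand-ins for the face detection algorithm 90 and the cursor-based manual input, and the box layout and returned dictionary are assumptions made for the example, not part of the disclosure.

```python
from typing import Callable, List, Optional, Tuple

Box = Tuple[int, int, int, int]  # assumed face box layout: (x, y, width, height)


def framing_mode(image,
                 detect_faces: Callable[[object], List[Box]],
                 manual_locate: Optional[Callable[[object], List[Box]]] = None) -> dict:
    """Sketch of the framing mode: detect faces, fall back to manual input, store locations."""
    faces = detect_faces(image)          # automatic detection (step 120)
    source = "automatic"
    if not faces and manual_locate is not None:
        # Detection failed (decision block 130); photographer marks a face (block 134).
        faces = manual_locate(image)
        source = "manual"
    if not faces:
        # Nothing known about faces: the caller may re-pose the scene and retry (path 132).
        return {"faces": [], "source": None, "services": []}
    # Face locations are stored (step 140) for use by the later camera services.
    return {"faces": faces, "source": source,
            "services": ["focus", "exposure", "flash", "composition"]}
```

A caller would pass the framing image together with whatever detector it has available and then hand the stored face locations to the focus, exposure, flash and composition services.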

[0031] Focus assistance. Many modern cameras provide automatic focusing or user-designated focusing using a
focusing aim point in the viewfinder. Since people, when they appear, are usually the most important subject in images,
it is reasonable to properly focus the image on the faces of people unless directed otherwise by the photographer.
Systems are presently known (see, e.g., U.S. Patent No. 5,749,000, which is incorporated herein by reference) which 
include multiple focus detection areas and a steerable selection mechanism which selects one of the areas in response 
to an input stimulus (such as a voice instruction). Alternatively, as shown in the aforementioned '809 patent, autofocus 
can be performed within a measuring frame that is set to include a face. In connection with the present invention, after 
performing face detection on the framing image, the camera 10 engages the focus control algorithm 86 in a focus step 
150 to use its steerable auto-focusing system in the optical section 21 to select a particular focus detection area that 
will focus the image optimally for the preponderance of the faces in the scene. (Alternatively, the focus could be set 
optimally for the largest face in the scene, which is presumed to constitute the primary subject.) 
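
One way the selection of a focus detection area might be sketched is shown below. The scoring rule (summed overlap area between each focus area and the detected faces) is an assumption chosen for illustration of "the preponderance of the faces"; the function names, the rectangle layout and the fallback to the first area are likewise hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # assumed layout: (x, y, width, height)


def overlap_area(a: Box, b: Box) -> float:
    """Area of intersection of two axis-aligned boxes (0 if they do not overlap)."""
    w = min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0])
    h = min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1])
    return max(w, 0.0) * max(h, 0.0)


def select_focus_area(areas: List[Box], faces: List[Box],
                      largest_only: bool = False) -> int:
    """Return the index of the focus detection area that best covers the faces."""
    if not areas:
        raise ValueError("no focus detection areas")
    if not faces:
        return 0  # assumption: fall back to the first area when no face is known
    if largest_only:
        # Alternative policy: focus for the largest face only.
        faces = [max(faces, key=lambda f: f[2] * f[3])]
    scores = [sum(overlap_area(a, f) for f in faces) for a in areas]
    return max(range(len(areas)), key=lambda i: scores[i])
```

With `largest_only=False` the area covering most face pixels wins; with `largest_only=True` the choice follows the single largest face, which corresponds to the alternative in the parenthetical above.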
[0032] Exposure and flash determination. The camera 10 provides automatic exposure control and flash engage- 
ment through its exposure control algorithm 82 and flash control algorithm 84. A typical microprocessor-controlled 
exposure control apparatus is disclosed in U.S. Patent No. 4,503,508, which is incorporated herein by reference, and 
used for both ambient and flash exposure. The exposure control functionality provided by this patent can be confined 
to, or weighted for, a facial area located within the measuring window 49a described in relation to the aforementioned 
[...] principles of composition that result in pleasing images. For example, a small [...] subject [...]
uninteresting print. Also, the "rule of thirds" calls for the main subject to be placed [...]
vertical, horizontal, or both. Such principles are discussed in detail in Grill, T. and Scanlon, M.,
Photographic Composition, Amphoto Books, 1990.

[...] the face detecting camera 10 provides the composition-assistance mode 180 in which, based on the results
[...] 180). If the faces [...] lie along a [...] horizontal line (step 192), [...]
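
A composition aid based on the rule of thirds could be sketched along the following lines. This is only an illustration under stated assumptions: the face is reduced to a single center point, image coordinates have their origin at the top left with y increasing downward, and the `tolerance` parameter and the wording of the hints are hypothetical.

```python
from typing import Tuple


def thirds_suggestion(face_center: Tuple[float, float], width: float, height: float,
                      tolerance: float = 0.05) -> str:
    """Suggest how to shift framing so the face sits near a rule-of-thirds point."""
    x, y = face_center
    # The four intersections of the one-third and two-thirds grid lines.
    points = [(width * i / 3, height * j / 3) for i in (1, 2) for j in (1, 2)]
    tx, ty = min(points, key=lambda p: (p[0] - x) ** 2 + (p[1] - y) ** 2)
    # Deviation from the nearest intersection, normalized by the image size.
    dx, dy = (tx - x) / width, (ty - y) / height
    hints = []
    if abs(dx) > tolerance:
        hints.append("move subject right" if dx > 0 else "move subject left")
    if abs(dy) > tolerance:
        hints.append("move subject down" if dy > 0 else "move subject up")
    return ", ".join(hints) if hints else "composition ok"
```

The returned string stands in for whatever suggestion the camera would present to the photographer on the display.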



Final Image Mode 



[...] processing of the framing image as shown in Figure 3, the camera is ready to
[...] provided the photographer with the aids [...]

[...] the human visual system demonstrates marvelous [...] to maintain
perceptual [...] be performed by taking into account some understanding of the
nature of [...], and their relative importance. If an entire scene be reproduced correctly and yet
[...] variations in skin tones among different persons and ethnic groups can be statistically categorized and
understood. Furthermore, it fortuitously happens that the natural variations of skin colors and the offensive errors in
skin color reproduction tend to lie in orthogonal directions in properly selected color space representations. Skin colors
vary along the blue-red dimension, while unacceptable reproduction errors primarily concern the green-magenta di- 
mension. 

[0038] Knowledge of the presence and location of faces in a scene can lead to improved color balancing in two 
different ways. If only global image correction is available (as in optical printing of analogue images), then the estimate 
of global illumination can be adjusted so as to result in a pleasing rendering of the skin tones of the faces. The face 
detecting camera, by recording the location and sizes of faces in the magnetic layer of the analogue film medium 25, 
enables later optical photofinishing equipment to optimize the color balance for proper reproduction of skin tones on 
the face. On the other hand, if a digital processing step is available, then the facial region can be corrected independently 
of more global considerations of illumination. This is the best possible scenario, leading to a better print than could be 
obtained by solely optical means, because both the primary subjects (people) and the background regions can be 
pleasingly reproduced. In either case, the camera 10 utilizes its color balance algorithm 88 in a face preferential cor- 
rection step 260 to provide optimal color balance for the image based at least in part upon the located faces. More 
specifically, the CPU 30 interacts with the measuring frame 49a generated by the tracking stage 49 to collect color 
data from the detected face(s) and then to weight the color balance algorithm 88 for the facial area. 
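
The idea of weighting the color balance for the facial area can be illustrated with the sketch below. It is not the color balance algorithm 88 itself: the gray-world estimate used for the global gains, the `skin_target` color and the `face_weight` blend factor are all assumptions introduced for the example.

```python
from typing import Sequence, Tuple

RGB = Tuple[float, float, float]


def mean_rgb(pixels: Sequence[RGB]) -> RGB:
    """Average color of a non-empty sequence of RGB triples."""
    n = len(pixels)
    return tuple(sum(p[c] for p in pixels) / n for c in range(3))


def face_weighted_gains(all_pixels: Sequence[RGB], face_pixels: Sequence[RGB],
                        skin_target: RGB = (200.0, 150.0, 130.0),
                        face_weight: float = 0.7) -> RGB:
    """Blend global (gray-world) channel gains with gains that map the mean
    face color onto an assumed target skin tone."""
    scene = mean_rgb(all_pixels)
    gray = sum(scene) / 3.0
    global_gains = [gray / max(c, 1e-6) for c in scene]
    if not face_pixels:
        return tuple(global_gains)      # no face data: global correction only
    face = mean_rgb(face_pixels)
    face_gains = [t / max(c, 1e-6) for t, c in zip(skin_target, face)]
    w = face_weight
    return tuple(w * f + (1.0 - w) * g for f, g in zip(face_gains, global_gains))
```

Here `face_pixels` plays the role of the color data collected within the measuring frame 49a, and `face_weight` controls how strongly the correction favors the faces over the scene as a whole.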
[0039] Red eye notification and correction. A red-eye detection algorithm 80, such as the one disclosed in com- 
monly assigned U.S. Patent No. 5,432,863, which is incorporated herein by reference, is run in the red eye correction 
step 270 against the final captured image. The detected presence of a face is used as additional evidence in the red- 
eye algorithm 80 to help prevent false positive errors. A pair of detected red-eyes should be corroborated by the re- 
enforcing evidence of facial presence. The existence of red eye can also be provided by the red eye detection algorithm 
to the display interface 70, which can designate an appropriate warning in the display device 34. After receiving red- 
eye notification, the photographer may choose to obtain another image. Or, the automatic red-eye correction algorithm 
can be invoked to remove the offensive red highlights in the eyes if the camera 10 is a digital camera. 
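
The use of facial presence as corroborating evidence for red-eye detection might be sketched as a simple filter. This is a simplification: the text treats the face as additional evidence, whereas the example below applies a hard accept/reject rule, and the data layout (eye centers as points, faces as rectangles) is assumed.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # assumed layout: (x, y, width, height)


def corroborate_red_eyes(eye_pairs: List[Tuple[Point, Point]],
                         faces: List[Box]) -> List[Tuple[Point, Point]]:
    """Keep only candidate red-eye pairs whose two eyes both lie inside one face box."""
    def inside(pt: Point, box: Box) -> bool:
        x, y, w, h = box
        return x <= pt[0] <= x + w and y <= pt[1] <= y + h

    return [pair for pair in eye_pairs
            if any(inside(pair[0], f) and inside(pair[1], f) for f in faces)]
```

Pairs that survive the filter would go on to notification or correction; pairs outside every detected face are treated as likely false positives.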
[0040] Orientation marking. Many consumer photo-finishing orders are now returned with an index print of small 
versions of each image in a sequence. The utility of the index print is diminished if the images are not all printed in the 
proper natural orientation. The presence of faces in an image provides a powerful cue as to its proper orientation. For 
instance, the facial dimensions can be separated into their principal axes, and the longest axis can be taken as the
up-down axis; then one of the face detection algorithms to be described can distinguish the hair region and thereby 
infer an upright orientation. The majority of faces will be upright or close to upright, in the sense of overall image 
orientation. The face detection algorithm 90 in the face detecting camera 10 will determine orientation and tag each 
captured image on the image storage device 32 (or 25) with a notation of the proper orientation as suggested by the 
face orientation detected in the orientation step 280. 
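
The principal-axis idea can be sketched as follows, assuming the face and hair regions are available as lists of pixel coordinates with y increasing downward. The closed-form angle of the major axis of a 2-D point set is standard; the function names and the four-way output labels are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def principal_axis_angle(points: List[Point]) -> float:
    """Angle (radians, measured from the x axis) of the longest principal axis."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    return 0.5 * math.atan2(2.0 * sxy, sxx - syy)


def head_top_direction(face_points: List[Point], hair_points: List[Point]) -> str:
    """Return 'up', 'down', 'left' or 'right': the image side toward which the hair lies."""
    theta = principal_axis_angle(face_points)
    ax, ay = math.cos(theta), math.sin(theta)      # unit vector along the long axis
    fx = sum(p[0] for p in face_points) / len(face_points)
    fy = sum(p[1] for p in face_points) / len(face_points)
    hx = sum(p[0] for p in hair_points) / len(hair_points)
    hy = sum(p[1] for p in hair_points) / len(hair_points)
    # Orient the axis so that it points from the face centroid toward the hair centroid.
    if (hx - fx) * ax + (hy - fy) * ay < 0:
        ax, ay = -ax, -ay
    if abs(ay) >= abs(ax):
        return "up" if ay < 0 else "down"          # image y grows downward
    return "right" if ax > 0 else "left"
```

An image whose faces report "up" is already upright; the other three results indicate the rotation needed before index printing.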

[0041] Face labeling. Once faces have been detected, a simple face recognition algorithm can be applied to identify 
faces from a small gallery of training faces that the camera has previously captured with help from the user and stored 
in the training data base 72. The gallery could contain the individuals in a family, for example, or children in a school 
class. When a new image has been captured by the camera, and the faces detected, the identity of each individual, 
established by the face recognition algorithm, can be recorded with the image in the digital storage 32 or magnetic
layer of the film 25. Such face identity information flows with the image into photofinishing or subsequent computer 
processing. The information can be used to automatically label prints with the names of the people in the image. Other 
possible applications include automatically producing albums that contain images of a single person or a group of 
persons specified by the customer. For a typical face recognition algorithm, a number of commercially available
face recognition products offer software development kits, allowing their algorithms to be embedded
in larger systems. For example, the "Face-It" system produced by Visionics Corp. would be suitable for use as a face
recognition algorithm. 

Face detection algorithms 

[0042] A face detection algorithm that operates in a digital camera must meet the criteria necessary for success 
given limited computational and computer memory resources. That is, the algorithm must operate rapidly (say, in less 
than one second) and with sufficiently high performance in terms of true positive/false positive detection rates. Coun- 
terbalancing the limited resource base, the fact that the results of the algorithm will be presented to or used by a human 
operator implies that some tolerance will exist for algorithm failures. This tolerance is an enabling characteristic of the 
proposed invention. 

[0043] In this embodiment, we propose for usage a combination of two face detection algorithms whose joint usage 
provides higher performance in terms of detection levels than either algorithm individually. The first detector, component 
W, is a very fast pre-screener for face candidates. The second detector, component S, is a sophisticated pattern match- 
ing algorithm characterized by a very low rate of false positives. Face candidates labelled by component W will be 






subsequently examined by component S to result in a final detection decision. 
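
The two-stage arrangement can be sketched generically as below. The callables `prescreen` and `verify` are hypothetical placeholders for component W and component S; the sketch shows only the control structure, in which the expensive verifier runs on the few regions that pass the fast pre-screener.

```python
from typing import Callable, Iterable, List, TypeVar

Region = TypeVar("Region")


def detect_faces_two_stage(regions: Iterable[Region],
                           prescreen: Callable[[Region], bool],
                           verify: Callable[[Region], bool]) -> List[Region]:
    """Run a fast pre-screener first, then a costlier verifier on its candidates."""
    # Stage 1: cheap test that may admit false positives.
    candidates = [r for r in regions if prescreen(r)]
    # Stage 2: expensive test applied only to the surviving candidates.
    return [r for r in candidates if verify(r)]
```

Because the verifier never sees regions rejected in the first stage, its cost scales with the number of candidates rather than with the number of windows scanned.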
The Component W 



[0044] Wu et al. published a face detection algorithm (hereinafter, as modified, the component W) that is well suited
[...] (see Wu, H., Chen, Q. and Yachida, M., "Face Detection from Color Images Using a
Fuzzy Pattern Matching Method", IEEE Trans. Pattern Analysis and Machine Intelligence, 21(6), 557-563, 1999, which
is incorporated herein by reference). The algorithm is very fast and requires very small amounts of both program
[...]. The component W is a kind of [...] pattern recognizer that searches for image windows
that seem likely to contain faces based on color characteristics. The method essentially looks for windows in which
the central portion seems likely to contain skin, based on its color and shape; and surrounding regions (around the top
and sides of the skin) that seem likely to contain hair, again based on color and shape. Since the method is based on
color signals, it requires that the imagery on which it operates be encoded in a meaningful color metric.
[0045] The component W has a training phase and a test phase. The training phase comprises the collection of skin
and hair color distributions, and the gathering of shape models from suitable training examples. In the test phase, a
window is scanned over the image at a complete range of scales and positions. The component W implicitly assumes
that the upward orientation of the image is known, and that the faces are roughly aligned with the image orientation.
This assumption could be relaxed by carrying out the entire face search several times (probably three, since the camera
would not be used upside down), once for each possible image orientation. In the test phase, the algorithm applies
the following steps once for each image window to be examined:

1) Compute skin and hair probability maps. Comparison of each pixel in a digitized image with pre-determined
probability tables of skin and hair colors, leading to an a posteriori probability that the pixel represents human skin or
hair. The probability tables must be collected off-line and stored in the camera. They are collected with the same
imaging sensor as in the digital camera, using identical spectral sensitivities.

2) Map probabilities to estimated area fractions via non-linearity. Face shape models are built from training
examples, also off-line. These models encode the likelihood of the occurrence of skin and hair colors in each cell
of a rectangular grid overlaid on spatially normalized human faces in a small set of standard head poses.

3) Perform fuzzy pattern matching with face shape models. A rectangular window is scanned to each pixel
position of the image in turn, and a judgment is made as to whether the window contains a face. To accommodate
a range of face sizes, the scanning process is repeated with windows varying over a range of sizes. The judgment
of whether a face is present in a window of the image is based on a fuzzy comparison between the pre-determined
face shape models and the actual distribution of a posteriori skin and hair probabilities in each cell of the window.
The fuzzy comparison makes use of parameterized non-linearities, as described in the Wu et al. article, that are
adjusted in a calibration stage in order to provide the best results.

These steps are now described in more detail after introducing the face shape models. It should also be understood
that extensive detail can be found by referring to the Wu et al. article.

[0046] Shape models. The head shape models are low-resolution representations of the spatial distribution of skin
and hair in typical face poses. There is one model for skin and one model for hair for each distinct pose. Each model
consists of m x n cells (currently, m=12 and n=10), with each cell encoding the fraction of the cell that is occupied by
skin (for skin models) or hair (for hair models) for typical heads in a given pose. An image window can be spatially
corresponded with the cells in the shape models. Depending on the window resolution, a single pixel or a block of
pixels may correspond to each model cell. The models were built using a set of training images to which affine
transformations have been applied in order to place the two eyes in standard positions. The spatially normalized images
were then manually segmented into skin and hair regions, and the fractional cell occupancy, at the resolution of the
models, was computed. An example of the shape models for frontal and right semi-frontal poses is shown in the Figures.
The models are stored in the training database 72 (shown in Figure 1) with gray-level encoding of the occupancy
fraction.

[0047] Compute hair and skin probabilities. The objective at this point is to acquire probability distributions of skin
and hair colors from training images. The goal is to obtain probability tables of the form P(skin | color) and P(hair |
color). Instead of using the Farnsworth perceptually uniform color space as suggested in the Wu et al. article, the
present invention uses the (L,s,t) color space as a preferred color metric for distinguishing skin and hair regions, and
therefore performs probability training and application in the (L,s,t) color metric, where L = c(R + G + B); s = a(R - B);
t = b(R - 2G + B); a, b and c are constants; and R, G and B are image values proportional to relative log scene exposure.
This metric has proven to be an effective color space in which to perform image segmentation. While all three channels
are used, the luminance channel is separated from the combined chrominance channels in the probability histograms.
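The (L,s,t) conversion described above can be illustrated with a minimal sketch. It assumes the definitions L = c(R + G + B), s = a(R - B) and t = b(R - 2G + B); the function name `rgb_to_lst` and the specific constant values are hypothetical, since the text states only that a, b and c are constants.

```python
import math

# Hypothetical normalization constants (assumption): the source does not give values.
A = 1.0 / math.sqrt(2.0)
B = 1.0 / math.sqrt(6.0)
C = 1.0 / math.sqrt(3.0)


def rgb_to_lst(r, g, b, a=A, b_coef=B, c=C):
    """Convert relative log scene exposure R, G, B values to an (L, s, t) triple."""
    lum = c * (r + g + b)            # luminance channel
    s = a * (r - b)                  # first chrominance channel
    t = b_coef * (r - 2.0 * g + b)   # second chrominance channel
    return lum, s, t
```

With any choice of constants, a neutral pixel (R = G = B) maps to s = t = 0, which is why the chrominance pair can be histogrammed separately from luminance.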
[0048] To gather skin color statistics, an annotated database including some 1800 images was used, each image
stored in 12 bit, relative log scene exposure RGB metrics. Human judged color balance and white point aims were
available for these images, as well as the eye locations of all faces in the database with two eyes visible. Using an
anthropometrically average face model, the skin pixels of the faces were extracted for all faces in the database. The
color balance and white point aims were also subtracted from the images in each case. The pixel values were then
converted to the (L,s,t) metric using the matrix computation:
where the hatted quantities have been adjusted for the aim values. To gather hair color statistics, an analogous process
was performed, with the exception that the hair regions were manually segmented for each example head. Each pixel
of skin or hair results in an example of a color in the (L,s,t) space. Separate probability histograms were accumulated
for both hair and skin. The L histograms were compiled separately from the two dimensional s,t histograms in each
case. Thus, an implicit assumption is taken that the colors and luminance of skin and hair are independently distributed.
At test time, an image is first processed by using the color values at each pixel to look up a posteriori likelihoods that
either skin or hair was imaged to that pixel. Bayes' theorem is applied to the probability distributions to ensure that the
distribution of colors in world objects is taken into account. The result of this computation provides two graphical displays 
of the skin and hair probability density, as shown in Figures 8A and 8B, and 9A and 9B, respectively, where each 
graphical display represents two views of skin and hair pixel probabilities, respectively, separated between luminance 
and chrominance components. 
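As a rough illustration of the look-up just described, the sketch below applies Bayes' theorem with the luminance and chrominance histograms treated as independent. The function name, the histogram containers and the use of separate "all colors" histograms for the evidence term are assumptions made for the example; the source does not specify how the distribution of colors in world objects is stored.

```python
def posterior_skin(lum_bin, chroma_bin, hist_l_skin, hist_st_skin,
                   hist_l_all, hist_st_all, prior_skin):
    """Estimate P(skin | color) from normalized histograms (hypothetical layout).

    hist_l_* map a luminance bin to a probability; hist_st_* map an (s,t) bin
    to a probability. L and (s,t) are assumed independent, as in the text.
    """
    likelihood = hist_l_skin[lum_bin] * hist_st_skin[chroma_bin]  # P(color | skin)
    evidence = hist_l_all[lum_bin] * hist_st_all[chroma_bin]      # P(color)
    if evidence == 0.0:
        return 0.0  # color never observed in the training data
    return min(1.0, likelihood * prior_skin / evidence)
```

The same function would serve for hair by substituting the hair histograms and prior.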


Map skin and hair probabilities to estimated area fractions. 






[0049] The shape models contain information about the spatial distribution of colors in face images, while the prob- 
abilities computed in the previous step depend on the average color in candidate facial regions. Since they are different, 
the two categories of information cannot be directly compared. Therefore, an intermediate step is needed to map the 
probability values in an image window into estimated skin and hair occupancy fractions by the use of a non-linear 
mapping function. The non-linear function is a sigmoidal-type function with adjustable parameters a and b, and is given
by the following equation.
The Wu et al. article claims to adjust the parameters a and b separately for each of the skin and hair models based on
empirical experiment in order to produce the best face detection results. In the present invention, the mapping
non-linearity was found to be most useful to compensate for the deficiencies in the statistical sampling of skin and hair
colors. For this reason, the parameters a and b are set quite low. The goal is for the component W to almost never
miss a face, counting on further processing by the component S to eliminate the many false detections that pass through
the component W.

Perform fuzzy pattern matching with face shape models. 

[0050] Given the shape model, with skin and hair area coverage fractions for each cell, and estimates of the same
quantities for corresponding image regions coming out of the non-linear mapping, a judgment is made as to the similarity
between the image regions and the model cells. The similarity measure uses a two term "fuzzy relation":

$$\mathrm{similarity}(I_s, I_h, M_s, M_h) = e^{-a \cdot \mathrm{dist}(I_s, I_h, M_s, M_h)^{b}}$$

$$\mathrm{dist}(I_s, I_h, M_s, M_h) = \sqrt{(I_s - M_s)^2 + (I_h - M_h)^2}$$



Po™t w W 0 """"*' 10 *"">' <•<« by »» Com- 



ponent W. 
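The two-term fuzzy relation used by the component W, similarity = exp(-a * dist^b) with dist the Euclidean distance between the (skin, hair) fractions of an image cell and a model cell, can be sketched as follows. The default values of a and b, and the choice to average cell scores over the window, are assumptions made for illustration; they are not stated in the text above.

```python
import math


def cell_similarity(i_s, i_h, m_s, m_h, a=2.0, b=1.0):
    """Fuzzy similarity between image fractions (i_s, i_h) and model fractions (m_s, m_h)."""
    dist = math.sqrt((i_s - m_s) ** 2 + (i_h - m_h) ** 2)
    return math.exp(-a * dist ** b)


def window_similarity(image_cells, model_cells, a=2.0, b=1.0):
    """Average cell similarity over a window (averaging is an assumption of this sketch).

    image_cells and model_cells are equal-length lists of (skin, hair) fraction pairs.
    """
    scores = [cell_similarity(i_s, i_h, m_s, m_h, a, b)
              for (i_s, i_h), (m_s, m_h) in zip(image_cells, model_cells)]
    return sum(scores) / len(scores)
```

A window whose estimated fractions match the model exactly scores 1.0, and the score decays toward 0 as the mismatch grows; a threshold on this score would then select face candidates.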
Component S 



[0052] The simplifications made to the distribution

P(face | image) (1)

are the following:

(1) standardize face region size 

(2) decompose face region into subregions 

(3) ignore dependencies between subregions 

(4) project subregions to lower dimension representation using PCA 

(5) code projections using sparse coefficients 

(6) quantize sparse coefficients 

(7) decompose appearance and position 

(8) ignore position for uncommon patterns 

(9) vector quantize positional dependence for common patterns 

(10) apply (1)-(9) at multiple resolutions, assuming independence between resolutions

1. Standardize face region size. Spatially normalized faces will be presented in a 56x56 pixel region. This
simplification changes (1) into

P(face | region) (2)

where region is exactly a rasterized vector of pixels from a 56x56 pixel image window.

2. Decompose face region into subregions. Each face region is decomposed into multiple overlapping 16x16 
pixel subregions. These subregions can be anchored at every pixel position in the region, or at a subset of these 
positions. We anchor subregions at every third pixel in every third line. With this choice, there are 196 possible 
anchor positions of a subregion within a face region; this position can therefore be encoded in a single byte. On 
the right hand side of (2), "region" can be replaced with "{subregion}", an aggregate of subregions. The subregion 
size is chosen so that individual subregions, when suitably positioned, are large enough to contain facial features 
(such as eyes, nose, or mouth). This size limits the largest image feature that can be examined as a coherent unit. 
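The count of 196 anchor positions can be checked with a short sketch. It assumes that anchors start at offset 0 and that each 16x16 subregion must lie entirely within the 56x56 region; the function name is hypothetical.

```python
def anchor_positions(region_size=56, sub_size=16, stride=3):
    """Top-left (row, col) anchors of subregions placed every `stride` pixels and lines."""
    last = region_size - sub_size          # largest offset that keeps the subregion inside
    offsets = range(0, last + 1, stride)   # 0, 3, ..., 39 for the default sizes
    return [(row, col) for row in offsets for col in offsets]
```

With the default sizes there are 14 offsets per axis, giving 14 x 14 = 196 anchors, so an anchor index fits in one byte.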

3. Ignore dependencies between subregions. No attempt is made to model the statistical dependencies between 
subregions. This simplification therefore limits the type of object attributes that can be modeled. For example, while 
a single subregion can contain sufficient pixels to capture an eye, the other eye will fall into a different subregion, 
and there can be no consideration taken of similarity or dissimilarity between the two eyes. Nor can any reasoning 
be based on the relative levels of illumination of different parts of the face. Using this simplification, equation (2) 
can now be replaced with 



$$\prod_{i=1}^{\#\mathrm{subregions}} P(\mathrm{face} \mid \mathrm{subregion}_i) \tag{3}$$

where the statistical independence is reflected in the lack of joint dependencies on multiple subregions. 
4. Project subregions to lower dimension representation using principal components analysis (PCA). Since 
subregions contain 256 pixels, with 256 gray levels each, the number of possible subregions is huge. The next 
simplification involves applying the standard technique of linear PCA to reduce the dimensionality of the subregion 
from 256 to twelve. (The choice of twelve dimensions is somewhat arbitrary. Upwards of 90% of actual subregion 
variance can be encoded using no more than twelve dimensions.) To perform the PCA, a large training set of face 
images was processed, with all subregions participating in the data analysis. Some experimentation was performed 
to see whether separate principal components are necessitated for different image resolutions and multiple face 
poses. Based on these findings, it was decided that distinct sets of principal components would be stored for each 
resolution, but that it was not necessary to keep different sets by face pose. Intuitively, it seems reasonable that 
at different resolutions the essential facial structures would exhibit unique spatial patterns, while the changes 
caused by slightly different facial poses would be less significant in the first few principal modes of variation. 

The result of the projection step is that each image subregion becomes represented by the twelve projection 
coefficients along the principal component axes. This representation amounts to representing each subregion by 
a linear combination of twelve principal subregions. The projection operation is carried out by a matrix operation 

$$[\mathrm{proj}] = A^{T}[\mathrm{subregion}] \tag{4}$$

where A is the projection matrix whose columns contain the eigenvectors (principal components) of the training 
subregions. Note that the PCA operates on a training set of face images only. False (non-face) examples are not 
used since the resulting principal components would likely be subject to wide variability caused by statistically 
inadequate sampling of the very large set of possible non-face images. As a result of this step, expression (3) 
leads to 



$$\prod_{i=1}^{\#\mathrm{subregions}} P(\mathrm{face} \mid \mathrm{proj}_i) \tag{5}$$

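The projection of step 4, proj = A^T subregion, can be sketched in a few lines. Here `components` holds the principal components as rows (the columns of A); the function names are hypothetical, and any mean subtraction before projection is left out because the text does not mention it.

```python
def project(subregion, components):
    """Return the coefficients A^T x, one per principal component."""
    return [sum(c * x for c, x in zip(vec, subregion)) for vec in components]


def reconstruct(coeffs, components):
    """Approximate the subregion as a linear combination of the principal components."""
    n = len(components[0])
    return [sum(coeffs[k] * components[k][i] for k in range(len(components)))
            for i in range(n)]
```

In the setting described above, `subregion` would be a 256-element rasterized 16x16 block and `components` would contain twelve 256-element eigenvectors.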


5. Code projections using sparse coefficients. Rather than retain all twelve projection coefficients, the subregion
representation is further compressed by retaining only the six most significant. However, this sparse coding scheme
is further complicated by grouping the last six coefficients pair-wise into groups and considering their sum square
values when selecting the six projection dimensions to retain. In this way, twelve coefficients are reduced to six
for subsequent processing.

6. Quantize sparse coefficients. Further compression of subregion representation occurs through discrete quantization
of the coefficients using a Lloyd-Max quantizer. This quantizer minimizes the mean-square quantization
error under the assumption of a Gaussian distribution of the independent variable. For common values of
the number of quantization values, the bin breakpoints and the reconstruction levels of Lloyd-Max quantizers are
tabulated in Lim, J., Two-Dimensional Signal and Image Processing, Prentice-Hall: New Jersey, 1990. To test the
validity of the Gaussian distribution assumption, the actual distributions of the projection coefficients of the training
set were collected, from which it was seen that the Gaussian assumption closely matches the actual distribution.

The choice of the number of sparse coefficients retained and the number of quantization levels allocated to
each coefficient determines the number of possible quantization values that encode image subregions. Based on
the choice of six prominent dimensions, with choices of 8, 4, or 2 quantization levels for each dimension, the
algorithm as implemented can represent each subregion by one of approximately 1,000,000 numbers. These quantized
numbers are somewhat inscrutably called "q1" values in the Schneiderman et al. reference. The number of
possible q1 values is an algorithm sizing parameter referred to as "n_q1" in that reference.

The compression advantage of this quantization scheme becomes clear when it is seen that 256^256 possible
subregion patterns are encoded in 10^6 distinct numbers. In fact, it is possible to consider this quantization scheme
as a form of image coding, from which an approximation to the original image can be reconstructed.
Figure 10 shows an original image and its reconstruction following PCA projection and sparse coding and
quantization. More specifically, Figure 10(a) shows the original image, Figure 10(b) shows a reconstruction from
projections of subregions into twelve dimensional principal component space, and Figure 10(c) shows a reconstruction
from the sparse coded and quantized version of Figure 10(b). (Note that images (b) and (c) do not show all
the encoded information. Rather, they show the reconstructions from encoding with subregions aligned with a
grid of 56x56 face regions. Simultaneous encodings capture further image information as the subregions are
offset relative to the region grid.) Following the quantization step, the probability expression (5) is further simplified
to

$$\prod_{i=1}^{\#\mathrm{subregions}} P(\mathrm{face} \mid q1_i) \tag{6}$$
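One way to picture how per-coefficient quantization yields a single q1 value is a mixed-radix code, sketched below. The breakpoints passed in are placeholders: actual Lloyd-Max breakpoints depend on the coefficient statistics and are tabulated in the Lim reference. The function names and the mixed-radix packing are assumptions of this sketch, not the encoding defined in the Schneiderman reference.

```python
import bisect


def quantize(value, breakpoints):
    """Index of the bin containing value; sorted breakpoints define len(breakpoints) + 1 bins."""
    return bisect.bisect_right(breakpoints, value)


def q1_code(coeffs, breakpoints_per_dim):
    """Pack the per-dimension bin indices into one integer (mixed-radix packing)."""
    code = 0
    for value, breakpoints in zip(coeffs, breakpoints_per_dim):
        levels = len(breakpoints) + 1
        code = code * levels + quantize(value, breakpoints)
    return code
```

The number of distinct codes is the product of the level counts across the retained dimensions, which is the quantity the text calls the number of possible q1 values.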



7. Decompose appearance and position. At this point in the chain of simplifications of the probability distribution,
expression (6) is expanded to explicitly include both the pixel pattern of a subregion and its position within the face
region. Equation (6) is replaced with

$$\prod_{i=1}^{\#\mathrm{subregions}} P(\mathrm{face} \mid q1_i, pos_i) \tag{7}$$



where each subregion is now represented by its quantization value and its position within the face region. Inter- 
pretation of expression (7) intuitively leads to thoughts like the following: eye-like patterns ought to occur in face 
regions only in the subregions likely to contain eyes. 

8. Ignore position for uncommon patterns. Given that 1,000,000 quantization levels and 196 positions are
possible for each subregion, further simplifications of expression (7) must occur. Two more simplifications are
made to this expression. First, a decision is taken to encode the positional dependence of only the most commonly
occurring q1 patterns. To this end, the q1 patterns are sorted by decreasing frequency of occurrence in the training
set. All q1 patterns that sort below an occurrence threshold will have their positional dependence
replaced by a uniform positional distribution. The number of q1 patterns whose positional distribution is to be
explicitly learned during training is an algorithm sizing parameter referred to as "n_est" in the Schneiderman
reference. For the uncommon patterns, expression (7) becomes

$$\prod_{i=1}^{\#\mathrm{subregions}} \frac{P(\mathrm{face} \mid q1_i)}{npos} \tag{8}$$

where npos = 196 is the number of possible subregion positions.
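The split between common and uncommon patterns in step 8 can be sketched as follows: the most frequent q1 patterns keep a learned positional distribution, and all others fall back to the uniform value 1/npos. The data layout (a dict of counts and a dict of per-pattern position tables) and the function names are hypothetical.

```python
def select_common_patterns(counts, n_est):
    """Return the set of the n_est most frequently occurring q1 values.

    counts maps each q1 value to its occurrence count in the training set.
    """
    ranked = sorted(counts, key=lambda q: counts[q], reverse=True)
    return set(ranked[:n_est])


def position_probability(q1, pos, common, learned, npos=196):
    """P(pos | q1, face): learned table for common patterns, uniform otherwise."""
    if q1 in common:
        return learned[q1][pos]
    return 1.0 / npos
```

Only the patterns in `common` need a stored positional table, which is what keeps the memory footprint manageable.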

9. Vector quantize positional dependence for common patterns. The second simplification to expression (7)
involves a further reduction in the number of positional distributions learned during training. Already, the
simplification of section 8 has reduced the number of positional distributions to be learned from n_q1 to n_est. Now, a further
reduction from n_est to n_q2 will be performed by vector quantizing the n_est surviving positional distributions into n_q2
representative distributions. For purposes of this quantization, the two dimensional positional distributions of the
q1 patterns are rasterized into vectors. The number n_q2 is an algorithm sizing parameter.

The vector quantization training algorithm is not the standard LBG algorithm, but rather an ad hoc custom
algorithm, performed on a single pass through the input vectors. This single-pass nature is important, since the
training algorithm will likely be quantizing tens or hundreds of thousands of vectors, and therefore must show
concern for speed. The training process is outlined as follows:



For each vector x
    Find the closest current pattern center
    Calculate the distance d between x and the closest center. The sum
    squared error (SSE) metric is used.
    If d < threshold
        Add x to cluster; update cluster center
    else
        Seed new cluster with x

For this algorithm to function properly, it must of course handle empty clusters gracefully, and also deal with the
imposition of a maximum number n_q2 of clusters. The cluster centers are computed as the average (ideally, weighted
average by occurrence count) of the vectors that map to the cluster. The selection of the distance threshold is
problematic and based essentially on empirical observation of the behavior of the quantization training when using
different values of the threshold. The goal of this selection is to make full use of the available number of quantization
levels while spreading out the distribution vectors as uniformly as possible.

Upon application of the vector quantization of positional distributions, the position pos in expression (7) is
mapped to one of the VQ pattern centers, identified as pos'. Equation (7) then becomes, for more common patterns,

$$\prod_{i=1}^{\#\mathrm{subregions}} P(\mathrm{face} \mid q1_i, pos'_i) \tag{9}$$

10. Apply detection at multiple resolutions, assuming independence between resolutions. Since the statistical
dependencies between subregions cannot be captured in the simplified probability model that has been developed,
features larger than subregions cannot be considered. To overcome this limitation, multiple levels of
image resolution are now introduced. The entire mechanism of the probability estimator in (2) will be applied to
multiple levels of image resolution, leading to

$$\prod_{j=1}^{nmags} \prod_{i=1}^{nsubs} P(\mathrm{face} \mid q1_i^j) \tag{10}$$



A typical example would be that of a single face captured at nmags=3 levels of pixel resolution. At each resolution,
the eyes must reside at standard positions.
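The single-pass vector quantization training outlined in step 9 can be sketched in Python as below. The running-mean update of the cluster center and the rule of assigning a vector to its nearest cluster once the maximum number of clusters is reached are assumptions of this sketch; the text says only that the cluster limit must be handled. The sketch also assumes `max_clusters` is at least 1.

```python
def sse(u, v):
    """Sum squared error between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(u, v))


def single_pass_vq(vectors, threshold, max_clusters):
    """Cluster vectors in one pass; returns (centers, assignments)."""
    centers, counts, assignments = [], [], []
    for x in vectors:
        best, d = None, float("inf")
        if centers:
            best = min(range(len(centers)), key=lambda k: sse(x, centers[k]))
            d = sse(x, centers[best])
        if best is not None and (d < threshold or len(centers) >= max_clusters):
            # Add x to the closest cluster and update its center as a running mean.
            counts[best] += 1
            n = counts[best]
            centers[best] = [c + (xi - c) / n for c, xi in zip(centers[best], x)]
        else:
            # Seed a new cluster with x.
            centers.append(list(x))
            counts.append(1)
            best = len(centers) - 1
        assignments.append(best)
    return centers, assignments
```

In the application described, each input vector would be a rasterized positional distribution of length 196 and `max_clusters` would correspond to the sizing parameter for the number of representative distributions.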

Full form of simplified probability distribution. Gathering together expressions (8) and (10), and applying
Bayes' theorem to relate prior probabilities gathered during training to the posterior probabilities in these expressions,
leads to the full form (11) of the estimated likelihood of face presence in an image region. Details of the complete
derivation of this equation appear in the Schneiderman reference.

$$P(\mathrm{face} \mid \mathrm{region}) = \prod_{j=1}^{nmags} \prod_{i=1}^{nsubs} \frac{P(q1_i^j \mid \mathrm{face})\, P(pos' \mid q1_i^j, \mathrm{face})\, P(\mathrm{face})}{\dfrac{P(q1_i^j \mid \mathrm{face})}{npos}\, P(\mathrm{face}) + \dfrac{P(q1_i^j \mid \overline{\mathrm{face}})}{npos}\, P(\overline{\mathrm{face}})} \tag{11}$$

In this expression, $P(\mathrm{face})$ and $P(\overline{\mathrm{face}})$ represent the prior probabilities that an image region does or does not
contain a face. In the absence of this knowledge, uniform priors equal to 1/2 are used, leading to a further simplification
of expression (11). This assumption about the prior probabilities does not affect the performance of the algorithm
when used for pattern recognition of faces. Rather, it results in the presence of a scaling factor that must be taken into
account when interpreting the algorithm output as a probability value.
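With uniform priors of 1/2, each factor of the product in expression (11) reduces to npos * P(q1 | face) * P(pos' | q1, face) / (P(q1 | face) + P(q1 | non-face)). The sketch below evaluates the product in the log domain to avoid numerical underflow. The table layout and the function name are hypothetical, and the tables are assumed to have been smoothed so that no looked-up probability is zero.

```python
import math


def log_face_score(observations, p_q1_face, p_q1_nonface, p_pos_face, npos=196):
    """Log of the product over subregions and resolutions, with uniform priors.

    observations is an iterable of (q1, pos) pairs, one per subregion at each
    resolution; p_pos_face[q1][pos] holds P(pos' | q1, face).
    """
    total = 0.0
    for q1, pos in observations:
        numerator = p_q1_face[q1] * p_pos_face[q1][pos]
        denominator = (p_q1_face[q1] + p_q1_nonface[q1]) / npos
        total += math.log(numerator) - math.log(denominator)
    return total
```

Because the priors cancel only up to a constant factor, the returned value is a score to be thresholded rather than a calibrated log probability, in line with the remark about the scaling factor.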

Training steps - Phase I. While actual training of algorithm S involves a number of discrete steps, the training
divides naturally into two major phases. The goal of the first phase is to obtain specific parameters of the quantization
of face subregions. The initial step is to capture the covariance matrix and then principal components of the subregions
from the training set. As part of this step, following extraction of the principal components, another pass is made through
all the training subregions to gather statistics of their projections onto those twelve principal dimensions. The projection
statistics are then used to design the Lloyd-Max quantizer. Since the variation of face patterns is quite large when
considered across different scales of resolution, this process of extracting principal components and the statistical
distribution of the training data along those components must be repeated for each image resolution.
Training steps - Phase II. The second phase of training starts by passing through the training set and performing
the quantization of each subregion of each face example. The training set can be expanded by creating
slightly perturbed versions of each training exemplar. The frequency with which quantized values appear
is counted in a histogram having roughly 1,000,000 bins. Simultaneously, subregion positions at which each quantized
value occurs are accumulated. A sort operation arranges the quantization frequency histogram in decreasing order of
occurrence count. For the n_est most frequent quantized patterns, the positional distributions enter into the vector
quantization algorithm. Following vector quantization, only n_q2 seminal positional distributions are retained, and each
of the n_est frequent quantization values will have a positional distribution approximated by the retained distributions.
Applying the face detector. To use the trained face detection algorithm at test time, the computation of
expression (11) must be applied to an image region on which spatial and intensity normalization have been conducted.
Three different resolution versions of each candidate face region are required. The quantization value for each
subregion is computed, and the probability terms in expression (11) are looked up in the tables created during training.

[0058] To use expression (11) for face detection, a probability threshold must be selected. When the posterior
probability exceeds the threshold, then face detection has occurred. After the algorithm training process has been
completed, the threshold is determined by studying the classification performance of the algorithm when applied to a
verification set of face and non-face images. The threshold is set for optimal performance on the verification set, taking
into account the relative importance of false positive and false negative errors.

Dual screening face detector - the combined algorithms

[0059] In the preferred face detection algorithm of the invention, those face candidates generated by the component 
W become input to the face detector of the component S. Since the window shapes of the two algorithms are slightly 
different, a spatial affine transformation serves to frame the face candidate and place the eyes in standard position for 
the component S. A threshold is applied to the output from the component S to declare the presence of a face in an 
image window. 
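Selecting the threshold applied to the component S output from a verification set, with false positives and false negatives weighted by their relative importance, might look like the following sketch. The exhaustive search over observed scores and the linear cost model are assumptions; the text does not specify how the optimum is found.

```python
def choose_threshold(face_scores, nonface_scores, fn_cost=1.0, fp_cost=1.0):
    """Pick the threshold minimizing weighted errors on a verification set.

    A detection is declared when a score exceeds the threshold.
    Returns (threshold, cost); ties keep the lowest candidate threshold.
    """
    candidates = sorted(set(face_scores) | set(nonface_scores))
    best_t, best_cost = None, float("inf")
    for t in candidates:
        false_neg = sum(1 for s in face_scores if s <= t)
        false_pos = sum(1 for s in nonface_scores if s > t)
        cost = fn_cost * false_neg + fp_cost * false_pos
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t, best_cost
```

Raising `fp_cost` relative to `fn_cost` pushes the chosen threshold upward, trading missed faces for fewer false detections.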

[0060] Since the component W examines the image at a range of scales in a window that is scanned across the 
entire image, it is likely that a true face might be detected at more than one scale, and at several closely spaced window 
positions. Some method for combination of detection overlaps must be employed. Two different methods were tested. 
The first method simply used the strongest detection from a spatially overlapping group of detections. The second 
method computes the average eye locations of the overlapping detections. It was found empirically that the averaging 
technique resulted in more accurate eye positions, as judged visually by a human observer. 
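The second combination method, averaging the eye locations of overlapping detections, can be sketched as below. The detection record layout (a bounding box plus two eye coordinates) and the greedy grouping strategy are assumptions made for illustration; a production grouping would likely need to merge groups transitively.

```python
def overlaps(a, b):
    """True if boxes a and b, given as (left, top, right, bottom), intersect."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def mean_point(points):
    """Average of a list of (x, y) points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def merge_detections(detections):
    """Greedily group overlapping detections and average their eye locations."""
    groups = []
    for det in detections:
        for group in groups:
            if any(overlaps(det["box"], other["box"]) for other in group):
                group.append(det)
                break
        else:
            groups.append([det])  # no overlap with any existing group
    return [{"left_eye": mean_point([d["left_eye"] for d in g]),
             "right_eye": mean_point([d["right_eye"] for d in g]),
             "count": len(g)} for g in groups]
```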



Claims 

1. A digital camera for capturing an image of a scene, said digital camera comprising: 

a capture section for capturing an image and producing image data; 

an electronic processing section for processing the image data to determine the presence of one or more 
faces in the scene; 

face data means associated with the processing section for generating face data corresponding to attributes 

of at least one of the faces in the image; 

a storage medium for storing the image data; and 

recording means associated with the processing section for recording the face data with the image data on 
the storage medium. 

2. The digital camera as claimed in claim 1 wherein the face data corresponds to at least one of the location, orien- 
tation, scale and pose of at least one of the faces in the image. 

3. The digital camera as claimed in claim 1 wherein the electronic processing section further provides an indication 
that one or more faces have been detected. 

4. The digital camera as claimed in claim 3 further comprising a framing device for framing the image, and wherein 
the electronic processing section provides an indication in the framing device identifying the one or more faces 
that have been detected. 

5. The digital camera as claimed in claim 4 wherein the framing device is either an optical viewfinder that images the 
scene or an electronic display device that reproduces the image data. 

6. The digital camera as claimed in claim 1 wherein the recording means records the captured image data in the 
storage medium in digital folders dedicated to images with a particular number of faces in the scenes. 

7. The digital camera as claimed in claim 1 wherein the electronic processing section further includes a face recog- 
nition algorithm and a data base of known faces for generating facial identities, and wherein the recording means 
labels one or more images in the storage medium with the facial identities of known faces. 

8. The digital camera as claimed in claim 1 wherein the capture section further includes an exposure control section 
responsive to the presence of one or more faces for optimally exposing the image for at least one of the faces in 
the scene. 

9. The digital camera as claimed in claim 8 wherein the exposure control section optimally exposes the image for 
either the preponderance of faces in the scene or the largest face in the scene. 

10. The digital camera as claimed in claim 8 wherein the capture section further includes a flash unit, and wherein the
electronic processing section controls activation of the flash unit in order to optimize exposure for at least one of
the faces in the scene.

11. A digital camera for capturing an image of a scene, said digital camera comprising:

a capture section for capturing an image and producing image data;

an electronic processing section for processing the image data to determine the presence of one or more
faces in the scene;

face data means associated with the processing section for generating face data corresponding to at least

one of the location, orientation, scale and pose of at least one of the faces in the image;

a composition algorithm associated with the processing section for processing the face data and generating

composition suggestions for a user of the digital camera; and

a display device for displaying the composition suggestions to the user.

12. A digital camera as claimed in claim 11 wherein the composition suggestions include at least one of (a) an indication
that a main subject is too small in the image, (b) an indication that following the law of thirds will lead to a more
pleasing composition, (c) an indication that one or more faces have been cut off in the image, and (d) an indication that
a horizontal alignment of subjects should be avoided in the image.

13. A digital camera for capturing an image of a scene, said digital camera comprising:

a capture section for capturing an image and producing image data;

an electronic processing section for processing the image data to determine the presence of one or more
faces in the scene and generating face data therefrom;

an orientation algorithm associated with the processing section for generating orientation data indicating
orientation of the image based on the orientation of at least one of the faces in the image;
a storage medium for storing the image data; and

recording means associated with the processing section for recording the orientation data with the image data
on the storage medium.

14. A digital camera for capturing an image of a scene, said digital camera comprising:

a capture section for capturing an image and producing image data;

an electronic processing section for processing the image data to determine the presence of one or more
faces in the scene;

a red eye detection algorithm associated with the electronic processing section for generating red eye signals
indicating the presence of red eye in one or more of the faces; and

a display device responsive to the red eye signals for displaying a red eye warning to a user of the digital
camera.

15. A digital camera as claimed in claim 14 further comprising red eye correction means responsive to the red eye
signals for correcting the red eye in said one or more faces.

16. A hybrid camera for capturing an image of a scene on both an electronic medium and a film medium having a
magnetic layer, said hybrid camera comprising:

an image capture section for capturing an image with an image sensor and producing image data;
means for capturing the image on the film medium;

an electronic processing section for processing the image data to determine the presence of one or more
faces in the scene;

face data means associated with the electronic processing section for generating face data corresponding to
at least one of the location, scale and pose of at least one of the faces in the image; and
means for writing the face data on the magnetic layer of the film medium.

17. The hybrid camera as claimed in claim 16 further comprising:

a storage medium for storing the image data; and
recording means associated with the processing section for recording the face data with the image data on
the storage medium.

18. The hybrid camera as claimed in claim 16 wherein the electronic processing section provides an indication that
one or more faces have been detected.

19. The hybrid camera as claimed in claim 18 further comprising a framing device for framing the image, and wherein 
the electronic processing section provides an indication in the framing device identifying the one or more faces 
that have been detected. 

20. The hybrid camera as claimed in claim 16 wherein the electronic processing section further includes a face
recognition algorithm and a data base of known faces for generating facial identities, and wherein the recording means
records the facial identities of known faces on the magnetic layer of the film medium. 

21. The hybrid camera as claimed in claim 16 further including: 

orientation data means associated with the electronic processing section for generating orientation data
indicating orientation of the image based on the orientation of at least one of the faces in the image;
and wherein said recording means associated with the processing section records the orientation data on the 
magnetic layer of the film medium. 
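
The claim does not state how the orientation data are derived from the face orientation. As a minimal sketch only, assuming a hypothetical input of per-face in-plane rotation angles in degrees, one simple possibility is to snap each angle to the nearest multiple of 90 and take a majority vote. The function name and the input convention are illustrative assumptions, not part of the claimed subject matter.

```python
from collections import Counter


def image_orientation(face_angles_deg):
    """Return 0, 90, 180 or 270 for the image, or None when no face is given.

    face_angles_deg: hypothetical in-plane rotation of each detected face,
    in degrees, as might be reported by a face detector.
    """
    if not face_angles_deg:
        return None
    # Snap every face angle to the nearest quarter turn, wrapped into 0..270.
    snapped = [round(angle / 90) % 4 * 90 for angle in face_angles_deg]
    # Majority vote across faces decides the image orientation.
    return Counter(snapped).most_common(1)[0][0]
```

For example, `image_orientation([88, 92, 181])` votes twice for 90 and once for 180, so the orientation data recorded with the image would indicate a quarter turn.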

22. The hybrid camera as claimed in claim 16 further comprising an exposure control section responsive to the
presence of one or more faces for optimally exposing the film medium for at least one of the faces in the scene.

23. The hybrid camera as claimed in claim 22 further comprising a flash unit, and wherein the exposure control section 
controls activation of the flash unit in order to optimize exposure for at least one of the faces in the scene. 

24. A hybrid camera for capturing an image of a scene on an electronic medium and on a film medium, said hybrid 
camera comprising: 

an image capture section for capturing an image with an image sensor and producing image data; 
means for capturing the image on a film medium; 

an electronic processing section for processing the image data to determine the presence of one or more 
faces in the scene; 

face data means associated with the processing section for generating face data corresponding to at least 
one of the location, orientation, scale and pose of at least one of the faces in the image; 
composition means associated with the processing section for generating composition suggestions for a user 
of the hybrid camera; and 

a display device for displaying the composition suggestions to the user. 

25. A hybrid camera as claimed in claim 24 wherein the composition indicating aids include at least one of (a) an
indication that a main subject is too small in the image, (b) an indication that following the law of thirds will lead to
a more pleasing composition, (c) an indication that one or more faces have been cut off in the image, and (d) an indication
that a horizontal alignment of subjects should be avoided in the image.
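
The four kinds of indication listed above can be sketched as simple geometric tests on face bounding boxes. This is a minimal illustration only: the `Face` record, the function name and all three thresholds are hypothetical assumptions and are not taken from the claims.

```python
import math
from dataclasses import dataclass


@dataclass
class Face:
    x: float  # left edge of the face box, in pixels
    y: float  # top edge of the face box, in pixels
    w: float  # box width
    h: float  # box height


def composition_suggestions(faces, img_w, img_h,
                            min_area_frac=0.02,    # assumed threshold
                            thirds_tol_frac=0.10,  # assumed threshold
                            align_tol_frac=0.05):  # assumed threshold
    tips = []
    if not faces:
        return tips
    main = max(faces, key=lambda f: f.w * f.h)  # largest face as main subject
    # (a) main subject too small relative to the frame area
    if main.w * main.h < min_area_frac * img_w * img_h:
        tips.append("main subject too small")
    # (b) main subject far from every intersection of the thirds lines
    cx, cy = main.x + main.w / 2, main.y + main.h / 2
    nearest = min(math.hypot(cx - img_w * i / 3, cy - img_h * j / 3)
                  for i in (1, 2) for j in (1, 2))
    if nearest > thirds_tol_frac * min(img_w, img_h):
        tips.append("law of thirds")
    # (c) a face box touches or crosses the image border
    if any(f.x <= 0 or f.y <= 0 or f.x + f.w >= img_w or f.y + f.h >= img_h
           for f in faces):
        tips.append("face cut off")
    # (d) two or more faces share nearly the same vertical position
    if len(faces) >= 2:
        centers = [f.y + f.h / 2 for f in faces]
        if max(centers) - min(centers) < align_tol_frac * img_h:
            tips.append("horizontal alignment")
    return tips
```

A display device could then show each returned string to the user before capture; a face centered on a thirds intersection and large enough in the frame yields an empty list.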

26. A hybrid camera for capturing an image of a scene on an electronic medium and on a film medium, said hybrid 
camera comprising: 

an image capture section for capturing an image with an image sensor and producing image data; 
means for capturing an image and producing image data; 

an electronic processing section for processing the image data to determine the presence of one or more 
faces in the scene; 

red eye detection means associated with the processing section for generating red eye signals indicating the 
presence of red eye in one or more of the faces; and 

a display device responsive to the red eye signals for displaying a red eye warning to a user of the hybrid 
camera that another image should be captured. 

27. A method for determining the presence of a face from image data, said method comprising the steps of: 

(a) prescreening the image data with a first algorithm to find one or more face candidate regions of the image 
based on a comparison between facial shape models and facial probabilities assigned to image pixels within 
the region; and 
(b) operating on the face candidate regions with a second algorithm using a pattern matching technique to
examine each face candidate region of the image and thereby confirm a facial presence in the region, whereby
the combination of these algorithms provides higher performance in terms of detection levels than either
algorithm individually.
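
The two steps of this method can be illustrated with a minimal sketch. It assumes a hypothetical elliptical mask as the facial shape model, a precomputed map of per-pixel face probabilities, and normalized cross-correlation against a single template as the pattern matching technique. None of these choices, names or thresholds is stated in the claim; they are stand-ins chosen for brevity.

```python
import math


def ellipse_mask(size):
    """Assumed shape model: a filled ellipse inside a size x size window."""
    c, ry, rx = (size - 1) / 2, size / 2, size * 0.4
    return [[1 if ((y - c) / ry) ** 2 + ((x - c) / rx) ** 2 <= 1 else 0
             for x in range(size)] for y in range(size)]


def prescreen(prob, size, step=1, min_score=0.9):
    """Step (a): keep windows whose mean face probability inside the shape
    model reaches min_score. Returns (top, left) window corners."""
    mask = ellipse_mask(size)
    n_in = sum(map(sum, mask))
    candidates = []
    for top in range(0, len(prob) - size + 1, step):
        for left in range(0, len(prob[0]) - size + 1, step):
            inside = sum(prob[top + y][left + x]
                         for y in range(size) for x in range(size)
                         if mask[y][x])
            if inside / n_in >= min_score:
                candidates.append((top, left))
    return candidates


def ncc(patch, template):
    """Normalized cross-correlation of two equally sized 2-D lists."""
    a = [v for row in patch for v in row]
    b = [v for row in template for v in row]
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    num = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    den = math.sqrt(sum((p - ma) ** 2 for p in a) *
                    sum((q - mb) ** 2 for q in b))
    return num / den if den else 0.0  # flat patches carry no pattern


def detect_faces(gray, prob, template, step=1, min_score=0.9, min_ncc=0.7):
    """Step (b): confirm each prescreened candidate by pattern matching."""
    size = len(template)
    faces = []
    for top, left in prescreen(prob, size, step, min_score):
        patch = [row[left:left + size] for row in gray[top:top + size]]
        if ncc(patch, template) >= min_ncc:
            faces.append((top, left))
    return faces
```

In this sketch the prescreening step is cheap and permissive, so a skin-colored but featureless region passes it; the pattern matching step then rejects that region because it does not correlate with the template. This is one way to read the statement that the combination outperforms either algorithm used alone.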

28. A method for verifying the presence of red eye in an image, said method comprising the steps of: 

providing image data representative of the image; 

processing the image data with a red eye detection algorithm for generating red eye signals indicating the 
presence of red eye in the image; and 

corroborating the existence of red eye by utilizing a face detection algorithm to verify that the red eye signals 
correspond to the presence of one or more faces in the image. 
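
The corroborating step can be sketched as a containment test, assuming hypothetical outputs from the two algorithms: red eye signals as (x, y) pixel positions and detected faces as (left, top, width, height) boxes. Both representations and the function name are illustrative assumptions.

```python
def corroborate_red_eye(red_eye_points, face_boxes):
    """Keep only the red eye signals that fall inside a detected face box."""

    def inside(point, box):
        x, y = point
        left, top, width, height = box
        return left <= x < left + width and top <= y < top + height

    return [p for p in red_eye_points
            if any(inside(p, box) for box in face_boxes)]
```

With `red_eye_points = [(50, 60), (300, 20)]` and a single face box `(30, 40, 80, 100)`, only `(50, 60)` survives; the second signal is discarded because it does not correspond to any face.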

29. The method as claimed in claim 28 further comprising the step of correcting the red eye in the image. 